ABSTRACT

Previously, Lrp-like transcriptional regulator LysM 
from the hyperthermoacidophilic crenarchaeon 
Sulfolobus solfataricus was proposed to have a 
single target, the lysWXJK operon of lysine biosyn- 
thesis, and a single effector molecule, L-lysine. Here 
we identify 70 novel binding sites for LysM in 
the S. solfataricus genome with a LysM-specific 
nanobody-based chromatin immunoprecipitation 
assay coupled to microarray hybridization (ChIP-
chip) and in silico target site prediction using an
energy-based position weight matrix, and validate 
these findings with in vitro binding. LysM binds to 
intergenic and coding regions, including promoters 
of various amino acid biosynthesis and transport 
genes. We confirm that L-lysine is the most potent 
effector molecule that reduces, but does not com- 
pletely abolish, LysM binding, and show that several 
other amino acids and derivatives, including 
D-lysine, L-arginine, L-homoarginine, L-glutamine 
and L-methionine and branched-chain amino acids 
L-leucine, L-isoleucine and L-valine, significantly 
affect DNA-binding properties of LysM. Therefore, 
it appears from this study that LysM is a much 
more versatile regulator than previously thought, 
and that it uses a variety of amino acids to sense 
nutritional quality of the environment and to 
modulate expression of the metabolic machinery 
of Sulfolobus accordingly. 

INTRODUCTION 

The Leucine-responsive regulatory protein (Lrp) family of 
transcriptional regulators is one of the largest families of 
bacterial/archaeal regulators (1-3). Lrp-like regulators
exhibit a similar architecture (4-9). The N-terminal
DNA-binding domain bears a winged helix-turn-helix
motif; the C-terminal Regulation of Amino acid
Metabolism (RAM) domain shows a characteristic αβ
sandwich fold (βαββαβ) (5) and is involved in effector
binding and oligomerization. Intriguingly, despite these
structural similarities, bacterial and archaeal Lrp-like 
regulators modulate activity of two different transcription 
machineries. Bacterial Lrp-like regulators modulate the 
initiation frequency of a unique RNA polymerase that 
directly binds core promoter elements (10). In contrast, 
archaeal members regulate the activity of a eukaryotic-like 
transcription apparatus that consists of a TATA box, 
transcription factor B responsive element (BRE) and ini- 
tiator (Inr); three general transcription factors [TATA 
binding protein (TBP), transcription factor B and tran- 
scription factor E] and a complex RNA polymerase that 
is most homologous to eukaryotic RNA polymerase II 
(11-13). As in eukarya, the unique archaeal RNA poly- 
merase is recruited by protein-protein interactions with 
transcription factors. The fundamental differences
between the bacterial and archaeal transcriptional
machineries call for different modes of action of bacterial
and archaeal Lrp-like regulators, in particular concerning
transcriptional activation.

Escherichia coli Lrp, archetype of the Lrp-like family, is 
a global regulator that directly or indirectly affects a pleni- 
tude of metabolic pathways (14-16) and was recently 
shown to use several amino acids as cofactors (17). This 
regulator is proposed to play an important role in coord- 
ination of the metabolic switch of microorganisms on 
transitions between regimes of feast and famine (18,19). 
In contrast, most other bacterial Lrp-like regulators act
more specifically, with only one or a few targets, and are
generally involved in control of amino acid metabolism.
Much less is known about archaeal members. Frequently,
their cofactor has not yet been identified, targets apart
from the control region of their own gene (autoregulation)
are rarely known and the molecular mechanisms of the regulatory
process have not been unravelled [for a review see (2)].
Nevertheless, a few case studies, in particular with 
regulators from model organisms Sulfolobus solfataricus 
(crenarchaeote) and Methanocaldococcus jannaschii 
(euryarchaeote), indicate that archaeal Lrp-like regulators 
appear to be more versatile than their bacterial counter- 
parts and that they are also involved in regulation of 
central metabolism (20,21). 

LysM is an Lrp-like regulator from the hyperthermoacidophilic
crenarchaeon S. solfataricus that was previously
shown to bind the lysWXJK control region and was
proposed to act as a lysine-sensitive co-activator (22).
Both LysM and its target sequence in the lysW operator
are highly conserved among Sulfolobus genomes.
However, a high-resolution map of LysM-operator
DNA contacts has not been established, DNA
sequence specificity of LysM has not been thoroughly 
studied and, besides lysW, no additional target genes 
have been identified to date. 

To gain further insights into LysM action and its 
physiological role, we established a high-resolution 
contact map of LysM-lysW promoter/operator inter- 
actions and performed a saturation mutagenesis of the 
symmetrical LysM consensus box. Furthermore, we 
identified additional LysM-binding sites in the genome 
of S. solfataricus and validated these data with in vitro 
DNA-binding assays for a subset of selected targets. The 
data presented here clearly demonstrate that LysM binds 
with high affinity to several additional binding sites in the 
Sulfolobus genome both in vivo and in vitro. Furthermore, 
we show that LysM has a rather broad ligand-binding 
specificity and that several amino acids besides L-lysine 
significantly affect its DNA-binding capacity. 

MATERIALS AND METHODS 

Strains and growth conditions 

E. coli strain DH5α was used for all cloning and plasmid
propagation purposes. E. coli BL21(DE3) was used 
as a host for overexpression of recombinant proteins. 
S. solfataricus P2 (DSM1617) and Sulfolobus 
acidocaldarius (DSM639) were grown aerobically at 
80°C and 75°C, respectively, in Brock basic medium (23) 
supplemented with 0.2% sucrose, with or without L-lysine 
(5mM), as indicated. 

DNA and RNA extractions 

Genomic DNA (gDNA) from S. solfataricus P2 and 
S. acidocaldarius was extracted from 4 ml of a culture 
with a QuickPick SML gDNA kit (BioNobile). Plasmid 
DNA was extracted from transformed E. coli DH5α with
a Miniprep kit (Qiagen). For RNA extraction, 1 ml of an
exponentially growing S. solfataricus P2 culture [optical
density at 600 nm (OD600nm) of 0.3] was mixed with 2 ml
RNAprotect Bacteria Reagent (Qiagen) and centrifuged. 
Pelleted cells were subsequently lysed with proteinase K 
(Qiagen), and RNA was extracted with an RNeasy mini 
kit (Qiagen). RNA samples were mixed with 10 U of
DNase I (Roche) and incubated for 20 min at 37°C to
remove any contaminating gDNA. DNase I was
removed with the RNeasy mini kit according to the
clean-up procedure. All samples were analysed by
end-point polymerase chain reaction (PCR) with primers
DC1115f and DC1116r (Supplementary Table S1) to
confirm absence of gDNA.

Plasmid constructions and DNA manipulations 

To construct plasmid pET24-lysMSa-6xhis for
overexpression of C-terminally 6xHis-tagged S. acidocaldarius
LysM (LysM Sa), the open reading frame
(ORF) region of Saci_0752 was PCR-amplified using
gDNA as template and primers DC689f and DC690r.
The amplicon was digested with NdeI and XhoI and
ligated into kanamycin-resistant expression vector
pET24a (Novagen) digested with the same enzymes.
Vector pBendLYSM for the circular permutation assay
was obtained by ligating the annealed oligonucleotides
DC826f and DC827r bearing sticky XbaI sites into
pBend2 (24) digested with the same enzyme. All constructs
were verified by DNA sequencing. All oligonucleotides
used in this work are listed in the Supplementary Data
(Supplementary Table S1).

Production and purification of recombinant LysM from 
S. solfataricus and S. acidocaldarius 

Untagged recombinant S. solfataricus LysM (LysM Ss) was
produced in E. coli BL21(DE3) cells transformed with
plasmid pLUW632 (22). Expression was induced with 1 mM
isopropyl-β-D-thiogalactopyranoside at a cell density of
9 × 10^8 ml^-1, followed by overnight growth at 25°C.
LysM Ss was purified as described (25) with two modifications:
after harvesting by centrifugation, cells were
sonicated for 15 min at 20% of maximal amplitude in a
VibraCell® sonicator equipped with a continuously cooled
cell; purified protein was dialyzed overnight at 4°C against
LysM storage buffer composed of 20 mM Tris (pH 8.0)
and 20% glycerol.

We purified recombinant C-terminally 6xHis-tagged
LysM from S. acidocaldarius (LysM Sa) from E. coli
BL21(DE3) cells containing plasmid pET24-lysMSa-6xhis.
LysM Sa overexpression was induced by adding
1 mM isopropyl-β-D-thiogalactopyranoside at a cell
density of 9 × 10^8 ml^-1, followed by overnight growth at
30°C. Cells were collected by centrifugation, resuspended
in 6 ml of binding buffer (20 mM phosphate buffer, 0.5 M
NaCl and 40 mM imidazole, pH 7.4) and disrupted by sonication,
followed by centrifugation (10 min at 7000 rpm in a Jouan
centrifuge with AB50.10A rotor). The soluble extract
was heated at 80°C for 10 min and subsequently
centrifuged to remove denatured proteins. The harvested
supernatant was loaded on a HisTrap™ FF 1 ml
column (GE Healthcare) operated by an ÄKTA™ fast
protein liquid chromatography system (GE Healthcare).
The column was extensively equilibrated with binding
buffer before application of a linear gradient with
elution buffer (binding buffer with 500 mM instead of
40 mM imidazole). Peak fractions were analysed by
sodium dodecyl sulphate-polyacrylamide gel electrophoresis
and electrophoretic mobility shift assay (EMSA), and
then collected and dialyzed against LysM storage buffer
before storage at −80°C. In contrast to N-terminally
His-tagged LysM Ss (22), purified C-terminally His-tagged
LysM Sa did not precipitate and was correctly
folded, as indicated by DNA-binding activity and
cofactor response. All LysM Ss and LysM Sa protein concentrations
are expressed in monomer equivalents.

In vitro DNA binding: EMSAs, in-gel OP-Cu
footprinting and pre-modification binding interference 
analyses 

EMSAs were performed as described (26), either with
gel-purified single 5'-end 32P-labelled PCR fragments or
with purified 47-bp duplexes generated by annealing of
complementary oligonucleotides (Supplementary Table
S1), of which one was 5'-end labelled with [γ-32P]-ATP
by T4 polynucleotide kinase. Unless otherwise stated, separation
of free DNA from DNA-protein complexes was
performed on 6 and 8% polyacrylamide gels for the PCR
fragments and 47-bp duplexes, respectively. LysM-binding
reactions were performed in LrpB binding buffer as
described previously (27) with 25 µg ml^-1 sonicated
herring sperm DNA as non-specific competitor.

EMSA autoradiographs were scanned with a Microtek
Bio-5000 scanner, and binding equilibrium association
constants (KAs) were determined with the Densitometric
Image Analysis Software, for which a description will be
published elsewhere. To enable comparison of KAs
measured at different times, relative KAs were determined
by normalization with the KA for binding to the consensus
sequence fragment, measured each time in parallel. In-gel
footprinting of separated LysM-DNA complexes with the
1,10-phenanthroline-copper [(OP)2-Cu+] ion (Cu-OP) was
performed as described (28). Missing contact and pre-methylation
binding interference experiments were performed
as described previously (29), using LrpB binding
buffer to perform binding reactions. Reference ladders
were generated by chemical sequencing methods (30).

DNA bending test 

A circular permutation assay (24,31) was performed with a
set of six fragments of identical length (156 bp) bearing the
15-bp consensus LysM binding site at various distances
from the extremities. Fragments were generated by 
PCR amplification with pBendLYSM plasmid DNA as 
template and oligonucleotide pairs DC826f-EP31r 
(fragment I), EP15-EP16r (fragment II), EP17-EP18r 
(fragment III), EP9-EP10r (fragment IV), EP19-EP20r 
(fragment V) and EP21-EP22r (fragment VI) as primers. 
The LysM binding site is located closest to an extremity 
on fragments I and VI, and approximately in the middle in 
fragment IV. The apparent bending angle was calculated 
from relative mobilities of complexes on 8% polyacryl- 
amide, as described (31). 
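
As an illustration of how an apparent bending angle can be obtained from relative mobilities, the sketch below assumes the commonly used relation µM/µE = cos(α/2), where µM and µE are the relative mobilities of complexes with the binding site in the middle and at the end of the fragment. Whether (31) applies exactly this form is an assumption, and the function and argument names are illustrative only.

```python
import math


def apparent_bend_angle(mu_middle, mu_end):
    """Apparent bend angle (degrees) from relative complex mobilities.

    Assumes mu_middle / mu_end = cos(alpha / 2); this relation is an
    assumption of the sketch, not a statement of the exact method in (31).
    """
    ratio = mu_middle / mu_end
    if not 0.0 < ratio <= 1.0:
        raise ValueError("mobility ratio must lie in (0, 1]")
    return math.degrees(2.0 * math.acos(ratio))


# Hypothetical mobilities: a ratio of cos(18 deg) corresponds to a 36 deg bend.
angle = apparent_bend_angle(math.cos(math.radians(18.0)), 1.0)
```

A ratio close to 1 (little difference between middle and end positions) yields a small angle, which is why modest bends are hard to resolve in this assay.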

Real-time quantitative PCR 

First-strand cDNA was synthesized from 30 ng RNA with the
SuperScript® III First-Strand Synthesis SuperMix kit
(Invitrogen), according to the manufacturer's instructions.
Quantitative real-time PCR (qPCR) was carried out in a
Bio-Rad iCycler with iQ™ SYBR® Green Supermix
(Bio-Rad) using the following amplification protocol: initial
denaturation at 95°C for 3 min followed by 40 cycles of
95°C for 10 s and 55°C for 30 s, and one cycle of 95°C for
1 min and 55°C for 1 min. Reactions were performed with
12.5 µl SYBR Green supermix and 20-fold diluted cDNA
in a total volume of 25 µl. Amplification reactions were
performed in technical duplicates on biological quadrupli- 
cates, with a reaction without template as negative 
control. Specificity was verified by melt curve analysis. 
Sso0951, encoding TBP, was used as a reference gene 
for normalization. Of four genes that were tested (TBP, 
mini chromosome maintenance, 23 S rRNA and RNAP 
subunit B), TBP proved to have the most stable expression 
in tested conditions. Quantification cycles (Cqs) were
determined with Bio-Rad iQ5 software. Efficiencies of
gene-specific primer pairs were calculated by determining
the slope of a linear regression curve resulting from Cq
values for a 10-fold dilution series with gDNA as a
template. Relative expression ratios were calculated by
taking primer pair efficiencies into account (32).
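
The calculation described above can be sketched as follows, assuming the widely used conversion E = 10^(-1/slope) for a 10-fold dilution series and an efficiency-corrected ratio of the form E_target^ΔCq(target) / E_ref^ΔCq(ref). That (32) uses exactly this model is an assumption; all names and example values are hypothetical.

```python
def efficiency_from_slope(slope):
    """Amplification efficiency from the slope of Cq versus log10(template).

    A slope of about -3.32 corresponds to perfect doubling (E = 2).
    """
    return 10.0 ** (-1.0 / slope)


def relative_expression(e_target, cq_target_control, cq_target_sample,
                        e_ref, cq_ref_control, cq_ref_sample):
    """Efficiency-corrected expression ratio (sample relative to control).

    Assumed model: E_target**(dCq_target) / E_ref**(dCq_ref), with
    dCq = Cq(control) - Cq(sample) for each gene.
    """
    target_term = e_target ** (cq_target_control - cq_target_sample)
    ref_term = e_ref ** (cq_ref_control - cq_ref_sample)
    return target_term / ref_term


# Hypothetical example: target Cq drops by 2 cycles, reference is unchanged.
ratio = relative_expression(2.0, 25.0, 23.0, 2.0, 20.0, 20.0)
```

With both efficiencies equal to 2 the expression reduces to the familiar 2^-ΔΔCq form.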

Generation of LysM-specific nanobodies 

LysM-specific nanobodies were generated by immunizing
an alpaca (Vicugna pacos) with purified full-length
6xHis-tagged LysM Sa. A total of six injections at weekly
intervals were given, each with 200 µg protein. Plasma
obtained 4 days after the last injection showed an end
titre of ~10^4. Subsequently, using peripheral blood
lymphocytes, a library of variable domains of heavy chain
antibodies was constructed as described (33). The LysM Sa-directed
nanobodies were generated according to the previously
described bio-panning procedure (33). A chromatin
immunoprecipitation (ChIP)-grade nanobody that
does not disrupt LysM Sa- and LysM Ss-containing
protein-DNA complexes was selected by enzyme-linked
immunosorbent assay and EMSA (data not shown). The
specificity of the nanobody for LysM was further tested in
pull-down assays (Supplementary Figure S1).

ChIP-chip

ChIP was performed according to (34). S. solfataricus P2
(DSM1617) cells were harvested at mid-exponential
growth phase (OD600nm ~0.6). Two biological replicates
were used for each growth condition (with 5 mM lysine
and without lysine). For enrichment analysis, qPCR was
performed, as described above, with 5 ng DNA as
template. The 2^-ΔΔCq method was applied to calculate
ChIP enrichment from the ChIP DNA as compared
with input DNA (35). The lysW operator region was
amplified with primers DC1102f and DC1103r and, for
normalization, a reference sequence in E. coli DNA that
was spiked into all samples was amplified with the primers
DC821 and DC822 (Supplementary Table S1). The DNA
tiling microarray was designed and manufactured by 
NimbleGen (Roche) and sample labelling, hybridization 
and array processing were executed at NimbleGen, with 
ChIP input and output samples labelled with Cy3 and 
Cy5, respectively. 
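
The qPCR-based enrichment calculation mentioned above can be sketched as follows. The sketch assumes that ΔCq is taken between the target (lysW operator) and the spiked reference sequence, separately for ChIP and input DNA, and that perfect doubling per cycle applies; the argument names and example values are hypothetical.

```python
def chip_enrichment(cq_target_chip, cq_ref_chip, cq_target_input, cq_ref_input):
    """Fold enrichment of a target in ChIP DNA relative to input DNA.

    Computes 2**-(ddCq), where
      dCq(ChIP)  = Cq(target, ChIP)  - Cq(reference, ChIP)
      dCq(input) = Cq(target, input) - Cq(reference, input)
      ddCq       = dCq(ChIP) - dCq(input)
    The pairing of target and spiked reference is an assumption of this sketch.
    """
    delta_chip = cq_target_chip - cq_ref_chip
    delta_input = cq_target_input - cq_ref_input
    return 2.0 ** -(delta_chip - delta_input)


# Hypothetical Cq values: ddCq = (22 - 25) - (26 - 25) = -4, i.e. 16-fold.
fold = chip_enrichment(22.0, 25.0, 26.0, 25.0)
```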
Table 1. Predicted binding affinities for potential/confirmed LysM-binding motifs in selected targets that were further studied by
in vitro DNA-binding assays

                                    ++
                                    ++
                                    ++
42    Sso2336    ATACGCTAGGCTTAC    +
54               CTTCGATACACGAAT
78    Sso0155    ATACGGGCGGAGAAT
103   Sso2824    GCACGCTATTAGAGT
158   Sso2497    GTGCGATTTCAGCGT    +
172   Sso0340    CGGCGGGATTCGAAC

Binding motifs are ranked according to their predicted KDs, and it is indicated to what extent these targets are bound in vitro: (−) no
binding; (+) low-affinity binding [either unstable binding (smearing) or not all free DNA is complexed at high LysM concentrations];
(++) high-affinity binding (all free DNA is complexed at intermediate LysM concentrations). Predicted motifs are given with indication
of conservation of specificity-determining residues (bold). We refer to Figures 1 and 6 for the corresponding EMSA images.



Microarray data analysis was performed using an
extended version of the programme described by (36),
which uses the Ringo package of R-Bioconductor
(37). The source code of the extended programme is
made available through http://micr.vub.ac.be. It includes
importing data, data quality assessment, pre-processing of
data, identifying ChIP-enriched regions (chers) and
determining significant differences between experiments
with and without lysine. To avoid detection of peaks in
the log2 ratios due to reduced signal-to-noise ratios at low-intensity
probes, as part of the pre-processing step,
average log2 intensities over red and green channels are
calculated for each probe, and probes with resulting values
smaller than the average minus twice the standard deviation
are replaced by missing values. Next, a standard
normalization step is performed in which Tukey's
biweight mean across each sample's log2 ratios is subtracted
from individual log2 ratios, and resulting values
are averaged over replicates and smoothed over a 100-bp
window to reduce stochastic noise and systematic noise
due to differences in hybridization efficiency of different
probes.
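
The pre-processing steps described above can be sketched in plain Python as follows. This is an illustration only and not the authors' R/Ringo code: the one-step Tukey biweight with c = 5, the use of the sample standard deviation, and a running median as the smoothing function are all assumptions, and every function name is hypothetical.

```python
import statistics


def low_intensity_mask(red_log2, green_log2):
    """True for probes to keep: mean channel intensity >= mean - 2 SD."""
    avg = [(r + g) / 2.0 for r, g in zip(red_log2, green_log2)]
    cutoff = statistics.mean(avg) - 2.0 * statistics.stdev(avg)
    return [a >= cutoff for a in avg]


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight mean (robust centre); c and eps are assumed."""
    m = statistics.median(values)
    s = statistics.median([abs(v - m) for v in values])
    weights = []
    for v in values:
        u = (v - m) / (c * s + eps)
        weights.append((1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def normalize_sample(log2_ratios, keep):
    """Subtract the biweight mean; masked probes become None (missing)."""
    centre = tukey_biweight([x for x, k in zip(log2_ratios, keep) if k])
    return [x - centre if k else None for x, k in zip(log2_ratios, keep)]


def smooth(positions, values, window=100):
    """Running median over probes lying within window/2 bp of each probe."""
    half = window / 2.0
    out = []
    for p in positions:
        near = [v for q, v in zip(positions, values)
                if v is not None and abs(q - p) <= half]
        out.append(statistics.median(near) if near else None)
    return out


# Hypothetical probe positions (bp) and normalized log2 ratios.
smoothed = smooth([0, 40, 80, 500], [1.0, 3.0, 2.0, 9.0])
```

Averaging over replicates, which the text places between normalization and smoothing, is omitted here for brevity; it would be a per-probe mean of the normalized values.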

A region was considered as ChIP-enriched if smoothed
log2 ratios of all reporters mapped to this region exceeded a
threshold of 2 and if the region contained at least five
probe-matched positions, each of these positions being
<500 bp apart from another matched position within
this region. A constant threshold of 2 was chosen rather
than the more statistically inspired version used by (36) to
maintain an almost constant number of detected chers
across biological replicates, rather than ensuring that the
number of detected false-positives is smaller than a certain
amount. The resulting chers were extended at both sides
by 150 bp, and overlapping chers were then combined
into one cher. This additional step was necessary to make
sure that a cher is only detected once: sometimes a cher
could be detected more than once due to, for example, a
single reporter level that is under the threshold, whereas a
series of neighbouring reporter levels are above the
threshold.
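
The detection rule above (threshold, minimum probe count, maximum gap, extension and merging) can be sketched as below. It is a simplified illustration that assumes probes sorted by genomic position on a single linear coordinate; the function name and the toy data are hypothetical and this is not the Ringo implementation.

```python
def find_chers(positions, smoothed, threshold=2.0, min_probes=5,
               max_gap=500, extend=150):
    """Return merged (start, end) intervals of ChIP-enriched regions.

    positions: probe positions in bp, sorted ascending.
    smoothed:  smoothed log2 ratios (None marks a missing value).
    """
    regions, current = [], []
    for pos, val in zip(positions, smoothed):
        above = val is not None and val > threshold
        if above and (not current or pos - current[-1] < max_gap):
            current.append(pos)
            continue
        # Run is broken: keep it only if it holds enough probes.
        if len(current) >= min_probes:
            regions.append((current[0], current[-1]))
        current = [pos] if above else []
    if len(current) >= min_probes:
        regions.append((current[0], current[-1]))

    # Extend each region on both sides and merge overlapping regions.
    merged = []
    for start, end in ((s - extend, e + extend) for s, e in regions):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Toy example: two runs split by one sub-threshold probe at 500 bp.
positions = list(range(0, 2000, 50))
signal = [0.0] * len(positions)
for i in list(range(4, 10)) + list(range(11, 17)):
    signal[i] = 2.5
signal[10] = 1.0
chers = find_chers(positions, signal)
```

In the toy example the single probe below the threshold splits one peak into two regions (200-450 and 550-800 bp); after the 150-bp extension they overlap and are reported as one cher, which is the situation the merging step is meant to handle.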



Only chers that were present in both biological replicates
for at least one of the experiments with and
without lysine were taken into account for further
analysis. To determine which chers detected in presence
of lysine were significantly enhanced compared with the
situation without lysine and vice versa, the same procedure
to detect chers was repeated with a threshold of 1, and
significantly differentially enhanced chers were defined as
those that exceeded the threshold of 2 for one set of biological
replicates and did not exceed the threshold of 1 for
the other set.
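
One way to express this comparison is with interval overlaps, as in the sketch below. Treating "did not exceed the threshold of 1" as "no overlap with any cher called at threshold 1 in the other condition" is an interpretation made for this example, and the function names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if closed intervals a = (start, end) and b overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def reproducible(chers_rep1, chers_rep2):
    """Chers of replicate 1 that are also found in replicate 2."""
    return [c for c in chers_rep1 if any(overlaps(c, d) for d in chers_rep2)]


def differentially_enriched(chers_thr2, other_chers_thr1):
    """Chers called at threshold 2 in one condition that have no
    counterpart at threshold 1 in the other condition."""
    return [c for c in chers_thr2
            if not any(overlaps(c, d) for d in other_chers_thr1)]


# Hypothetical coordinates (bp).
with_lysine_thr2 = [(100, 500), (2000, 2600)]
without_lysine_thr1 = [(300, 700)]
specific = differentially_enriched(with_lysine_thr2, without_lysine_thr1)
```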

In silico binding site prediction 

A binding energy-weighted sequence logo for LysM
DNA-binding sequence specificity was calculated according
to (38) by equating the frequency of each base to
the relative KA for binding to a consensus variant fragment
having the corresponding substitution divided by the sum
of the four KA values for each of the four bases at this
position.
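
A minimal sketch of this calculation is given below. The per-position frequencies follow the definition in the text; the information content per position is computed with the usual sequence-logo formula 2 + Σ f·log2(f), which is assumed here to be what the logo of (38) reports. The relative KA values in the example are invented for illustration.

```python
import math


def base_frequencies(rel_ka):
    """Energy-weighted base frequencies at one position.

    rel_ka maps each base to the relative KA of the variant carrying that
    base at this position (the consensus base has relative KA = 1).
    """
    total = sum(rel_ka.values())
    return {base: ka / total for base, ka in rel_ka.items()}


def information_content(freqs):
    """Information content (bits) of one position: 2 + sum(f * log2 f)."""
    return 2.0 + sum(f * math.log2(f) for f in freqs.values() if f > 0.0)


def total_information(matrix):
    """Sum of per-position information over a list of relative-KA dicts."""
    return sum(information_content(base_frequencies(p)) for p in matrix)


# Invented values: a strongly discriminating and a non-discriminating position.
strict = {"A": 0.01, "C": 0.01, "G": 1.0, "T": 0.01}
neutral = {"A": 1.0, "C": 1.0, "G": 1.0, "T": 1.0}
bits = total_information([strict, neutral])
```

A position at which all four variants bind equally well contributes 0 bits, whereas a position tolerating only one base approaches 2 bits.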

For each cher sequence, the best potential binding site
was predicted based on the energy-based position weight
matrix, and the corresponding theoretical KA was
calculated. This is shown in Table 1 and Supplementary
Dataset S1.
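
The scan for the best site in a cher can be illustrated as follows. The sketch assumes that positions contribute independently, so that the theoretical KA of a site is the consensus KA multiplied by the relative KA of the base at each position, and that both strands are scanned; neither assumption is confirmed by the text. The 3-bp matrix is a toy stand-in for the 15-bp LysM matrix.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def best_site(sequence, rel_ka_matrix, ka_consensus):
    """Return (site, theoretical KA) for the best-scoring window.

    rel_ka_matrix: list of dicts (one per position) mapping base -> relative KA.
    ka_consensus:  KA of the consensus site, in the units of the result.
    Assumes independent (multiplicative) contributions of the positions.
    """
    width = len(rel_ka_matrix)
    best = None
    for strand in (sequence, reverse_complement(sequence)):
        for i in range(len(strand) - width + 1):
            site = strand[i:i + width]
            ka = ka_consensus
            for pos, base in enumerate(site):
                ka *= rel_ka_matrix[pos][base]
            if best is None or ka > best[1]:
                best = (site, ka)
    return best


# Toy 3-bp matrix whose consensus is ACG (values are invented).
toy = [
    {"A": 1.0, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 1.0, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 1.0, "T": 0.1},
]
hit = best_site("TTACGTT", toy, 196.0)
```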

RESULTS 

The LysM-lysW operator interaction in S. solfataricus
and S. acidocaldarius

LysM is highly conserved among all sequenced Sulfolobus 
species, and their lysine biosynthesis genes are invariably 
organized in two consecutive operons with the same 
polarity, lysYZM and lysWXJK (Figure 1A). In 
S. solfataricus, only the latter bears a target site for
LysM (22). To analyse functional conservation of LysM
and its target site(s), we performed EMSAs of LysM Ss and
its orthologue LysM Sa from S. acidocaldarius binding to
the lysW operator of both organisms (Figure 1B and C).
Figure 1. Functional conservation of the LysM-lysW operator interaction in Sulfolobus. (A) Schematic overview of genomic organization of the lys
locus. ORFs are depicted by arrows, with names of corresponding genes mentioned above. Transcription start sites are indicated with small black
arrows. The genomic organization shown here is identical for all sequenced Sulfolobus species, and amino acid sequence identity between
S. solfataricus and S. acidocaldarius orthologues is mentioned below each lys gene. The lysW promoter/operator region that is subject of the
interaction analysis is indicated by a rectangle. (B) EMSAs of binding of LysM Ss to the lysW control regions of S. solfataricus (on a 203-bp
fragment) and S. acidocaldarius (on a 188-bp fragment), as indicated. Protein concentrations are mentioned on top of the autoradiograph. Positions
of free DNA (F), free single-stranded DNA (SS) and protein-DNA complexes (B1-B4) are pointed out. (C) EMSAs of binding of LysM Sa to the
lysW control regions of S. solfataricus and S. acidocaldarius, as indicated. Notations are the same as in subpanel (B).



The results indicate that each regulator binds the two
operators with a similar binding affinity. However,
binding and migration patterns showed remarkable differences.
LysM Ss formed two distinct complexes (B1 and B2)
with the cognate lysW operator in a concentration-dependent
manner, and some supershifted smear at the
highest concentration used. In contrast, binding of
LysM Ss to the heterologous lysW operator from
S. acidocaldarius resulted in the formation of essentially
one major complex (B1) and of a small amount of a
second, slower migrating complex (B2). At the highest
protein concentrations, two additional complexes (B3
and B4) were formed that exhibit an even lower relative
mobility. Binding of LysM Sa to both lysW operators
appeared very similar and produced essentially one
major complex (B1), a tiny amount of a slower migrating
complex (B2) and, at the highest protein concentrations,
supershifted smear.

Previously, a single high-affinity binding site for LysM Ss
was identified by DNase I footprinting at low protein concentrations,
whereas higher concentrations resulted in
protection of a large, but undefined, zone and in the appearance
of hyper-reactive bands (22). In-gel footprinting
with the Cu-OP ion (Figure 2A and Supplementary
Figure S2) demonstrated that in complex B1, LysM Ss protected
a stretch of 16 and 15 nucleotides (nt) on the top
and bottom strands, respectively (Figure 2F). This zone
extends from position −59 to −74 upstream of the initiation
codon (−50 to −65 upstream of the transcription
start site) and corresponds to the upstream binding site
in Figure 2F. The slower migrating complex B2 showed
two distinct zones of protection, one identical to that
observed in complex B1, and an additional slightly
shorter stretch of 14 nt on the top strand and of 11 nt on
the bottom strand (Figure 2A and Supplementary Figure
S2). This second zone of LysM binding (−31 to −44) is
located slightly downstream of the major LysM binding
site and covers the BRE and part of the TATA box of the
lysW promoter (Figure 2F). The centres of the principal
high-affinity site and of the more degenerated accessory
site are 21 bp apart; consequently, equivalent positions in
both sites are aligned on the same face of the helix.
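
The same-face argument follows from simple helical arithmetic, sketched below under the assumption of a canonical B-DNA repeat of 10.5 bp per turn (a textbook value, not one measured in this study). The function name is illustrative.

```python
def helical_phase_offset(distance_bp, bp_per_turn=10.5):
    """Angular offset (degrees) between two positions on the helix.

    0 means the same face of the helix; 180 means opposite faces.
    bp_per_turn = 10.5 is an assumed value for B-DNA in solution.
    """
    turns = distance_bp / bp_per_turn
    return (turns % 1.0) * 360.0


# 21 bp corresponds to exactly two helical turns under this assumption.
offset = helical_phase_offset(21)
```

A centre-to-centre distance of 21 bp therefore gives an offset of 0°, whereas a spacing of about 26 bp would place the two sites on nearly opposite faces.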

The major complex B1 formed on binding of LysM Sa to
the lysW operator of S. acidocaldarius showed a continuous
stretch of protection of 18 nt on the top strand and
16 nt on the bottom strand (Figure 2B and Supplementary
Figure S2). This zone extends from position −55 to −72
upstream of the initiation codon and aligns perfectly with
the principal LysM binding site in the lysW operator of
S. solfataricus (Figure 2F). Complex B2 did not show an
additional clear zone of protection. Formation of this
minor complex might therefore be due to binding of a
higher oligomeric state of LysM Sa (without further
contact with the DNA), to establishment of additional
non-specific interactions, or to formation of 'sandwich-type'
structures.

High-resolution contact mapping of the LysM-lysW
operator interaction in S. acidocaldarius

Missing contact probing assays (39) performed at different 
protein concentrations (Figure 2C and F and 
Supplementary Figure S2) indicated that removal of five 
pyrimidines of the top strand (T-70, C-68, T-63, T-62 and 
C-61) and seven of the bottom strand (C-72', C-71', T-66', 
T-65', C-60', T-59' and T-58') strongly inhibits complex 
formation with LysM Sa . Similarly, removal of eight 
purines of the top strand (G-72, G-71, A-69, A-66, 
A-65, G-60, A-59 and A-58) and six of the bottom 
strand (A-70', G-68', A-64', A-63', A-62' and G-57')
strongly interferes with complex formation (Figure 2D 
and F and Supplementary Figure S2). Pre-methylation 
binding interference experiments demonstrated that 
methylation of three guanine residues of the top strand 
(G-72, G-71 and G-60) and three of the bottom strand 
(G-68', G-57' and G-56') strongly reduces binding of 
LysM Sa (Figure 2E and F and Supplementary Figure 
S2). Removal of any of these six guanine residues also 
strongly inhibited LysM Sa binding, except G-56'. As this 
Figure 2. High-resolution contact probing of the LysM-lysW operator interaction. (A) In-gel Cu-OP footprinting experiment of binding of LysM Ss
to the lysW operator region of S. solfataricus (203-bp fragment, top strand labelled). On top, the EMSA is shown with indication of used protein
concentrations (in nM) and populations of free (F) and bound (B1-B2) DNA that were further analysed by denaturing acrylamide gel electrophoresis,
of which the autoradiograph is shown below. C + T and A + G indicate Maxam-Gilbert sequencing ladders. The region that is protected in both
nucleoprotein populations B1 and B2 is indicated with a black bar; the region that is only protected in complex B2 is indicated with a grey bar.
(B) In-gel Cu-OP footprinting experiment of binding of LysM Sa to the lysW operator region of S. acidocaldarius (188-bp fragment, top strand
labelled). Notations are the same as in subpanel (A). (C) Depyrimidation binding interference experiment of binding of LysM Sa to the lysW operator
region of S. acidocaldarius. Populations of input (I), free (F1 and F2) and bound DNA (B1, B2 and B3) are indicated. Observed effects are pointed
out with horizontal lines with the corresponding nucleotides mentioned. (D) Depurination binding interference experiment of binding of LysM Sa to
the lysW operator region of S. acidocaldarius. (E) Pre-methylation binding interference experiment of binding of LysM Sa to the lysW operator region
of S. acidocaldarius. (F) Alignment of the nucleotide sequence of the lysW operator region of S. acidocaldarius and S. solfataricus with a summary of
all results of footprinting and binding interference experiments. The lysM translational stop codon is underlined, the lysW translational initiation
codon is in bold and the semi-palindromic LysM binding motif is boxed, as are the BRE and TATA box promoter elements [predicted based on
transcription start site determination in (22)]. An asterisk indicates the transcription initiation site. Positions are numbered with respect to the lysW
initiation codon of S. acidocaldarius. Protection against chemical cleavage in Cu-OP footprinting is indicated with a grey shaded region. On top of
the sequence, a helical representation demonstrates minor and major groove orientation of LysM Sa binding. Strand regions that were protected
against cleavage are grey coloured, while binding interference effects are depicted by the following symbols: circles = depurination binding interference;
squares = depyrimidation binding interference; triangles = pre-methylation binding interference. Open symbols represent weak effects; filled symbols
strong effects. Subpanels (A-E) show representative experiments that were performed with DNA having the top strand labelled; experiments were
similarly performed with DNA having the bottom strand labelled, as summarized here and shown in Supplementary Figure S2.



position was not protected in the Cu-OP footprinting 
assay, we may conclude that G-56' is not directly con- 
tacted and that the negative effect of pre-methylation of 
this base most likely results from steric hindrance exerted 
by the methyl group on contact of LysM Sa with an 
adjacent base-specific or backbone group. These results 
confirm recognition of a semi-palindromic sequence
centred around position −64, which is highly conserved
between S. solfataricus and S. acidocaldarius (22)
(Figure 2F). The binding site overlaps part of the lysM
ORF and its translational stop codon. Furthermore, it is
located 11 bp upstream of the predicted lysW BRE and
TATA box. Dimethyl sulphate methylates guanine
residues at position N7 in the major groove. Therefore,
we may conclude that LysM interacts specifically with 
two consecutive major groove segments and intervening 
minor groove of the operator, all aligned on one face of 
the DNA helix (Figure 2F). It is interesting to note that in 
this helical orientation, TBP and LysM are expected to 
bind to the same face of the DNA. 
Sequence specificity of LysM binding

Based on phylogenetic footprinting of the lysW operator 
region, a degenerate consensus sequence was derived that 
allowed the definition of a strictly palindromic 15-bp 
consensus sequence for LysM binding (Figure 3A). 
LysM Sa specifically binds a 47-bp DNA fragment har- 
bouring this consensus sequence and forms a single 
complex in EMSA (Figure 3B). The average equilibrium
association constant (KA) for this interaction was
calculated to be 196 µM^-1 (corresponding dissociation
constant KD = 5 nM).
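
For reference, the conversion between the two constants quoted above is a simple reciprocal, assuming a 1:1 binding equilibrium with LysM concentrations expressed in monomer equivalents:

```latex
K_D = \frac{1}{K_A} = \frac{1}{196\ \mu\mathrm{M}^{-1}}
    \approx 5.1 \times 10^{-3}\ \mu\mathrm{M} \approx 5\ \mathrm{nM}
```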

Six fragments of identical length but with the LysM 
consensus site permuted were used to analyse DNA 
bending (Figure 3C and D). All six fragments migrated
with a similar mobility when unbound, indicating
the absence of a measurable intrinsic curvature
(Figure 3C). Therefore, the role of structure specificity in
LysM binding is limited. In contrast, LysM Sa-DNA
complexes displayed clear differences in relative
mobilities, which allowed calculation of an average
apparent bending angle of 36° (Figure 3C and D). This
result contrasts with the lack of measurable DNA bending
reported for LysM Ss using a similar assay (22).

To determine the sequence specificity of LysM binding,
EMSAs with LysM Sa binding to a set of 22 mutated
variants (all possible single-bp substitutions at all
positions of one half-site) were performed (Figure 4A), and
relative K_A values were calculated (Supplementary Table S2).
Due to full symmetry of the consensus site, only one sub- 
stitution was analysed at position 0. This analysis resulted 
in a quantitative model of binding specificity, representing 
a significant part of the complete binding energy land- 
scape. This model is graphically represented by an 
energy-normalized sequence logo (Figure 4B). The total
information content of the LysM binding specificity is
8.46 bits, and therefore, LysM binds DNA with a relatively
high sequence specificity. This specificity is restricted to
the half sites, particularly to positions −3 and −4, which
are highly discriminative for a G-C and a C-G bp,
respectively. The remainder of the sequence specificity in major
groove recognition is largely contributed by position −7,
displaying a preference for a G-C bp. As information
content of individual positions is higher in half sites
than in the central region, our analysis confirms
recognition of two major groove segments and the intervening
minor groove (Figure 4B). The results of the
high-resolution contact mapping of the lysW operator are fully
compatible with these data (Figure 2F). Unexpectedly,
there is no pronounced preference for weak bps at 
central positions. Generally, the contribution of the 
minor groove-recognized part of a site to sequence prefer- 
ence is low and mostly arising from a preference for A-T 
or T-A bps, which facilitate minor groove compression. 
Most likely, this situation reflects the relatively modest
LysM-induced DNA bending. This is further confirmed
by the observation that LysM Sa binds mutant operators
bearing an I-C at positions −1 and +1 with only a 1.3-fold
higher affinity than mutant operators with G-C
substitutions at both positions (Supplementary Figure S3). Inosine
is identical to guanine in the major groove but lacks the
exocyclic amino group that causes a steric hindrance for
minor groove compression.

Figure 3. Binding and LysM Sa-induced DNA bending of the consensus
binding site. (A) Cartoon representing the LysM consensus sequence.
Inverted repeat elements are indicated with arrows; the axis of 2-fold
symmetry is depicted by an ellipse. (B) EMSA of binding of LysM Sa to
a DNA fragment encompassing the consensus sequence. This fragment
was generated by hybridization of 47-nt-long complementary
oligonucleotides containing the 15-nt LysM box flanked on either side by
the 16-nt stretches that surround the LysM Sa binding site of the lysW
operator in the S. acidocaldarius genome. Positions of bound (B) and
free (F) DNA are indicated, as are applied LysM Sa concentrations (in
nM). (C) EMSA with permuted DNA fragments bearing the consensus
binding site. In all binding reactions, we added 30 nM LysM Sa.
Characteristics of fragments are further described in Materials and
Methods, and lane numbers in the EMSA correspond to fragment
numbers. Positions of bound (B) and free (F) DNA are indicated.
(D) Graphical representation of the relative mobility (µ) of different
complexes as a function of the position of the insert sequence within
the DNA fragment. The apparent bending angle (α) was calculated as
follows: µ_M/µ_E = cos(α/2) (M = insert in the centre of the fragment,
resulting in the lowest value of µ, and E = insert at the end of the
fragment).

Figure 4. Systematic bp substitution of the LysM consensus sequence
to analyse sequence specificity of binding. In this figure, a
representative example of such a binding analysis is shown for position −4, but
analogous EMSAs have been performed and analysed for the other
positions (Supplementary Table S2). (A) EMSAs of LysM binding to
variants harbouring a single-bp substitution at position −4. The bp
change is mentioned on top of each autoradiograph, as are LysM Sa
concentrations (in nM). DNA fragments were prepared by hybridizing
complementary oligonucleotides. (B) Graphical representation of the
systematic bp substitution experiment in an energy-normalized
sequence logo. The height of the stack of letters corresponds to the
information content (bits).
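
The way relative K_A values from single-bp substitutions translate into a binding-energy model can be sketched as follows. This is an illustration only, not the matrix used in the study: the relative K_A numbers are placeholders (measured values are in Supplementary Table S2), only two positions are shown, additivity of per-position contributions is an assumption of the sketch, and all names are hypothetical. Energies are expressed in units of RT so that no temperature has to be assumed.

```python
import math

# HYPOTHETICAL relative association constants (K_A variant / K_A consensus).
# Placeholder numbers for illustration; they are not the measured values.
RELATIVE_KA = {
    -4: {"A": 0.05, "C": 1.00, "G": 0.02, "T": 0.04},
    -3: {"A": 0.03, "C": 0.02, "G": 1.00, "T": 0.05},
}


def penalty_in_rt(relative_ka: float) -> float:
    """Free-energy penalty in units of RT: ddG/RT = -ln(relative K_A)."""
    return -math.log(relative_ka)


def site_penalty_in_rt(bases: dict) -> float:
    """Sum per-position penalties, assuming positions contribute independently."""
    return sum(penalty_in_rt(RELATIVE_KA[pos][base]) for pos, base in bases.items())


def predicted_kd_nM(kd_consensus_nM: float, penalty_rt: float) -> float:
    """K_D(site) = K_D(consensus) * exp(ddG/RT)."""
    return kd_consensus_nM * math.exp(penalty_rt)


consensus = {-4: "C", -3: "G"}
variant = {-4: "A", -3: "G"}  # single substitution at position -4

# 5 nM is the K_D reported above for the consensus site.
kd_consensus = predicted_kd_nM(5.0, site_penalty_in_rt(consensus))
kd_variant = predicted_kd_nM(5.0, site_penalty_in_rt(variant))
print(f"consensus: {kd_consensus:.1f} nM, variant: {kd_variant:.1f} nM")
```

With these placeholder numbers, a 20-fold lower relative K_A at one position raises the predicted K_D 20-fold, which is how a matrix of this kind ranks candidate sites by predicted affinity.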

Genome-wide in vivo mapping of LysM Ss binding sites
in S. solfataricus

LysM binding sites were identified on a genome-wide scale
in vivo, using ChIP-chip assays with S. solfataricus cells
grown in either presence or absence of exogenous lysine
(5 mM) (Figure 5). Utilization of a LysM-specific
nanobody for immunoprecipitation avoided the need of
overexpressing or epitope-tagging LysM Ss. In total, 73
genomic regions distributed throughout the genome
exhibited an enrichment of more than 4-fold (normalized
log2 value of 2.0) in either both or one of the growth
conditions. For several peaks, this enrichment even
exceeded 8-fold. The resulting comprehensive dataset,
obtained by setting the threshold at a log2 value of 2.0,
is given in Supplementary Dataset S1.

These ChIP-chip assays provide the first direct evidence of
LysM Ss being bound in vivo to the lysW operator
(Supplementary Figure S4). Peak maxima were centred
around the characterized binding motif. By applying
qPCR to ChIP samples, we quantified enrichment of the
lysW operator/promoter region as being ~586- and
83-fold in absence and presence of lysine, respectively
(Supplementary Figure S5). Significant binding was also
detected at the lysYZM promoter region (Supplementary
Figure S4), which was previously shown not to be bound
by LysM in vitro. This observation necessitates revision of
the statement that LysM Ss is not involved in
autoregulation and in regulation of lysY and lysZ (22).
However, it remains unclear whether LysM Ss is associated 
at the lysY promoter because of direct recognition of a 
binding motif, or rather indirectly through protein- 
protein interactions (see later in the text). 

Besides lysW and lysY, 71 previously unknown LysM Ss
targets were identified. Sequences of these ChIP-enriched
regions were scanned with the previously developed binding
energy-based position weight matrix to predict the most
probable binding motif (Supplementary Dataset S1).
A comparison with the annotated genome information
of S. solfataricus (40) indicates that 76% of all predicted
binding motifs are located in ORFs. Given a genome
coding density of 84%, this percentage indicates only a
slight over-representation of binding motifs being
present in intergenic regions, which is unexpected for a
specific transcription factor.
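
The size of this over-representation can be made explicit by comparing the observed intergenic fraction of motifs with the fraction expected if motifs were spread uniformly over the genome. The sketch below uses only the two percentages quoted above; the function name is hypothetical, and the uniform-distribution expectation is an assumption of the sketch.

```python
def fold_enrichment(observed_fraction: float, expected_fraction: float) -> float:
    """Ratio of the observed to the expected fraction."""
    return observed_fraction / expected_fraction


motifs_in_orfs = 0.76   # fraction of predicted motifs located in ORFs
coding_density = 0.84   # fraction of the genome that is coding

intergenic_observed = 1.0 - motifs_in_orfs   # 24% of motifs are intergenic
intergenic_expected = 1.0 - coding_density   # 16% of the genome is intergenic

fold = fold_enrichment(intergenic_observed, intergenic_expected)
print(f"intergenic motifs are enriched {fold:.1f}-fold over a uniform expectation")
```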

Positions of predicted motifs located in intergenic 
regions with respect to the closest translational start 
varied significantly, but many are located between 
40 and 70 bp upstream of an initiation codon 
(Supplementary Dataset S1). There are also several cases
in which, as for the lysW operator, the predicted motif is
located at the 3'-end of an ORF and close to the promoter
of a downstream gene or operon (see later, Figure 6A and
F). Transcription units, of which expression is potentially
influenced by LysM Ss, encode proteins with various
functions, which can be classified in the following categories:
amino acid metabolism, central metabolism, transport,
clustered regularly interspaced short palindromic repeats
(CRISPR) immunity system, translation and hypothetical
proteins. For a distribution over the different functional
categories, see Figure 5 and Supplementary Dataset S1.
Curiously, LysM Ss is associated with several tRNA genes 
and CRISPR loci. Furthermore, of particular interest are 
potential target genes that function in biosynthesis and 
transport of amino acids other than lysine, some of 
which are studied in more detail later in the text. 

Figure 5. Genome-wide distribution of LysM Ss binding sites that are bound in vivo. (A) LysM Ss binding profile across the S. solfataricus
chromosome in cells in exponential growth phase grown in medium with supplementation of 5 mM lysine. The enrichment fold-ratio corresponds to the log2
of the signal ratio of ChIP-enriched DNA versus input DNA. Signals arising from biological duplicates are depicted in different colours (red/blue).
Selected targets that are called in this growth condition (signals exceeding a log2 value of 2.0 in both biological duplicates), and for which binding is
further analysed in this work, are labelled. (B) LysM Ss binding profile across the S. solfataricus chromosome in cells grown in medium without
additional supplementation of lysine. Notations are the same as in subpanel (A). (C) Pie chart showing the percentage of LysM binding sites
associated with genes divided in functional categories.

In vitro DNA-binding affinity of LysM Ss for the lysW
operator decreases on addition of lysine (22). However, 
the in vivo LysM Ss DNA-binding profiles appeared 
similar in either presence or absence of L-lysine 
(Figure 5). Based on the main criterion used for peak
calling, namely, that peaks exceed a log2 value of 2.0 in
both biological duplicates, not all peaks were called in
both growth conditions. Nevertheless, for all peaks
called in one condition, signals exceeded a log2 threshold
of 1.0 in the other condition, indicating that there is no
significant difference in binding profile in cells grown with
and without exogenous lysine (Supplementary Dataset
S1). However, it should be taken into account that
ChIP-chip profiles do not provide a quantitative
measure of DNA binding, as demonstrated for lysW:
whereas ChIP profiles in both conditions appear similar
(Supplementary Figure S4), qPCR quantification, which is
more accurate because of a higher dynamic range,
demonstrates a difference in enrichment of >4-fold
(Supplementary Figure S5). Consequently, despite the
observation that ChIP binding profiles are similar on growth
in presence and absence of lysine, DNA-binding charac- 
teristics such as affinity or stoichiometry might neverthe- 
less be different. 
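
The peak-calling criterion and the qPCR comparison described above can be written out as a short sketch. The replicate signal values passed to the function are hypothetical examples; only the thresholds and the two qPCR enrichment values come from the text, and the function names are illustrative.

```python
import math

CALL_THRESHOLD_LOG2 = 2.0   # log2 enrichment required in both duplicates


def fold_to_log2(fold: float) -> float:
    """Convert a fold enrichment to the log2 scale used in the ChIP-chip profiles."""
    return math.log2(fold)


def is_called(rep1_log2: float, rep2_log2: float,
              threshold: float = CALL_THRESHOLD_LOG2) -> bool:
    """A peak is called only if both biological duplicates exceed the threshold."""
    return rep1_log2 > threshold and rep2_log2 > threshold


# 4-fold and 8-fold enrichment correspond to log2 values of 2 and 3.
print(fold_to_log2(4.0), fold_to_log2(8.0))

# Hypothetical replicate signals: called in one case, not in the other.
print(is_called(2.5, 2.1), is_called(2.5, 1.4))

# qPCR enrichment of the lysW region without versus with lysine.
qpcr_ratio = 586.0 / 83.0
print(f"qPCR ratio = {qpcr_ratio:.1f}-fold")
```

A peak that narrowly misses the threshold in one duplicate is not called, which is why the sets of called peaks can differ between conditions even when the underlying profiles look alike.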

Validation of ChIP-chip data with in vitro
DNA-binding assays 

In vitro DNA binding was analysed with EMSA for a 
selected subset of eight potential targets with varying pre- 
dicted binding affinities and for which predicted binding 
motifs are located either in intergenic control regions or in 
ORFs (Figure 6 and Table 1). In this selection, we
included targeted genes that are involved in amino acid
metabolism: Sso0684 encodes a glutamate synthase (gltB),
Sso0977 a 2-isopropylmalate synthase (leuA-2)
and Sso1906 and Sso2043 both encode amino acid
transporter-related proteins. We also included the
control region of Sso0572 that codes for a conserved
hypothetical ATPase of the PilT family, for which the
predicted binding motif is located inside the preceding ORF,
and of Sso2824, which encodes a formate dehydrogenase
alpha subunit (fdhF-2). Two targets for which LysM Ss
binds within a coding region were also tested: these genes
code for a hypothetical protein (Sso2336) and acyl-CoA
dehydrogenase (Sso2497).

With exception of Sso2336, predicted binding motifs are 
located close to peak maxima (Figure 6). In EMSAs using 
100- to 200-bp fragments comprising a sequence centred 
around the predicted motif, LysM Ss binds with high 
affinity to control region fragments of gltB, leuA-2 and 
Sso2043 (Figure 6B, C and E). Binding of LysM Ss to 
gltB and leuA-2 resulted in concentration-dependent
formation of two complexes, whereas three complexes were
formed with Sso2043. Furthermore, low-affinity binding
yielded two complexes for Sso1906 and Sso0572 and a
single complex for the Sso2497 ORF fragment
(Figure 6A, D and G). The other two targets (Sso2336
and Sso2824) formed either unstable complexes resulting
in a smearing in EMSA or showed no complex formation
(Figure 6F and H). In the former case, we also tested
binding to a fragment bearing part of the ORF of
Sso2334 that corresponds better to the peak maximum
of the ChIP-enriched region, but again only weak binding resulting in
some smearing at the highest protein concentrations was
observed (Figure 6F).

We performed OP-Cu footprinting experiments for 
four fragments that exhibited high-affinity binding 
(Figure 7). For binding to gltB and leuA-2 operator frag- 
ments and to the Sso2497 ORF fragment, only a single 
protein-DNA complex was analysed (Figure 7A, B and 
D). In case of gltB and leuA-2, this complex corresponds
to the fastest migrating complex B1 (Figure 6B and C).
Without exception, the zone of protection is confirmed to 
contain the predicted LysM binding motif (Figure 7E). 

Further analysis of three protein-DNA complexes 
formed with the Sso2043 control region indicated that in
complex B1, which exhibits the highest relative mobility
and is formed at lower protein concentrations than the
two other complexes, the predicted binding motif is
bound (Figure 7C and E). This finding confirms the
high-affinity nature of this site (predicted K_D = 0.3 nM),
which is rationalized by full conservation of all 
specificity-determining residues (Table 1). In complex 
B2, which migrates with a lower relative mobility, foot- 
printing indicates that LysM Ss protects a different, but 
similarly sized, region. This region is located upstream 
of the high-affinity site and contains a binding motif 
with only one good half-site (Figure 7E). As centres of 
these two sites are 25 bp apart, they are contacted by the 
protein on opposite faces of the DNA helix. In complex 
B3, both sites are protected. 
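
The statement about helical faces can be illustrated with a simple phasing estimate. The sketch assumes a canonical B-DNA helical repeat of 10.5 bp per turn, a value that is not stated in the text; the function names are hypothetical.

```python
def angular_offset_deg(separation_bp: float, bp_per_turn: float = 10.5) -> float:
    """Rotation around the helix axis between two site centres, in degrees (0-360)."""
    return (separation_bp * 360.0 / bp_per_turn) % 360.0


def relative_face(separation_bp: float, bp_per_turn: float = 10.5) -> str:
    """Classify two sites as lying nearer the same or the opposite helical face."""
    angle = angular_offset_deg(separation_bp, bp_per_turn)
    folded = min(angle, 360.0 - angle)   # fold onto the 0-180 degree range
    return "same face" if folded < 90.0 else "opposite face"


offset = angular_offset_deg(25)
print(f"25 bp apart: {offset:.0f} degrees offset, {relative_face(25)}")
```

Under this assumption, a centre-to-centre distance of 25 bp corresponds to about 2.4 helical turns, so the two sites are rotated by roughly 137° relative to each other, which is closer to opposite faces than to the same face.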

A ranking of all in vitro tested binding sites according to 
their in silico predicted K_D values results in a clustering of
sites that are efficiently bound both in vitro and in vivo on
the one hand (K_D < 54 nM) and sites that are only bound
in vivo on the other hand (K_D > 78 nM) (Table 1). The two
categories of binding sites can be found in intergenic
regions as well as in ORFs, and ~24% of all binding
regions contain a binding motif with a predicted K_D
<50 nM, indicating the presence of a true high-affinity
LysM binding site (Supplementary Dataset S1). Binding
at low-affinity sites might be stabilized in vivo on
interaction with other sequence-specific DNA-binding proteins or with
nucleoid-associated proteins.

In vivo repression of LysM target genes by L-lysine 
supplementation 

We measured the effect of L-lysine supplementation 
(5 mM) to growth medium on expression of lysX (second 
gene of lysWXJK operon), leuA-2, gltB and Sso1906 in
wild-type cells with qRT-PCR (Figure 8). Addition of
L-lysine results in a significant downregulation of all four
LysM target genes tested, with the effect being most
pronounced for leuA-2 (10-fold reduction). Therefore,
the expression of genes involved in biosynthesis, and
possibly also transport, of amino acids other than lysine
is significantly modulated in response to changes in
intracellular lysine concentration.

Although differential expression of these genes in
presence or absence of lysine is no conclusive proof for
LysM Ss regulation, it is likely that at least part of this
regulatory response is direct and mediated by LysM Ss.
This hypothesis is supported by the observation that
binding characteristics are influenced by lysine
concentration, as demonstrated by decreasing ChIP
enrichment levels and binding affinities for the lysW
operator at increasing lysine concentrations in vivo and
in vitro, respectively (Supplementary Figure S5) (22).
In conclusion, it appears that LysM Ss stimulates
expression of all target genes at low intracellular lysine
concentration.

Figure 6. In vivo and in vitro LysM Ss binding for a selection of called ChIP-enriched regions: Sso0571 (A), gltB (B), leuA-2 (C), Sso1906 (D), Sso2043
(E), Sso2336 (F), Sso2497 (G) and Sso2824 (H). Zoomed profiles are plotted as the log2 of enrichment fold (y-axis) versus genomic position (x-axis).
Signals arising from biological replicates are depicted in different colours, of which the code is provided in the figure inset in subpanel (A). Aligned
with genomic positions, a schematic overview is given of intergenic and ORF regions. ORFs are depicted by open arrows and labelled. Intergenic
regions are represented by a horizontal line. In each ChIP-enriched region, a red rectangle indicates the 15-bp sequence that is predicted to show the
highest similarity to a LysM-binding motif. Below each binding profile, an EMSA is shown probing a fragment spanning the part of the
ChIP-enriched region containing the predicted motif, which is indicated by a blue line [in subpanel (F), two fragments are tested called 1 and 2].
LysM Ss concentrations are mentioned on top of the autoradiograph. Positions of free DNA (F), free single-stranded DNA (SS) and protein-DNA
complexes (B1-B3) are indicated.

Figure 7. In vitro DNA-binding analysis for newly discovered LysM Ss targets. (A) In-gel Cu-OP footprinting experiment of LysM Ss binding to the
control region of gltB. The autoradiograph of the EMSA is shown on top of the figure subpanel, with indication of LysM Ss concentrations (in nM),
position of single-stranded DNA (SS) and of excised DNA populations (I = input DNA, F = free DNA and B = bound DNA; indicated with
rectangles). The autoradiograph of the denaturing gel is shown below the EMSA: A + G and C + T stand for the Maxam-Gilbert sequencing ladders,
and I, B and F represent DNA populations. The protected region is indicated with a bar at the right side of the footprint autoradiograph. (B) In-gel
Cu-OP footprinting experiment of LysM Ss binding to the control region of leuA-2. Notations are the same as in subpanel (A). (C) In-gel Cu-OP
footprinting experiment of LysM Ss binding to the control region of Sso2043. Notations are the same as in subpanel (A), but in this case, there are
three different bound DNA populations: B1, B2 and B3. Hyper-reactivity is indicated by a ball-and-stick symbol on the left side of the
autoradiograph. (D) In-gel Cu-OP footprinting experiment of LysM Ss binding to the ORF of Sso2497. Notations are the same as in subpanel (A).
(E) Sequences of probed regions, with indication of protected regions (grey shade), 15-bp binding motifs (boxed) and position of hyper-reactivity
effect (ball-and-stick symbol). For the three control regions, the sequences are aligned according to the translational start (with indication of the
positions on top). The transcription start site is indicated with an arrow. For the Sso2497 ORF sequence, positions are indicated on top of the
sequence.

Figure 8. Relative gene expression analysis for LysM Ss target genes
lysX, gltB, leuA-2 and Sso1906 with qRT-PCR. Relative expression
level is given as fold change of the expression level during growth in
Brock medium with sucrose and 5 mM lysine versus the expression level
during growth in Brock medium with sucrose lacking amino acids.
Standard deviations (calculated for two biological replicates) are
indicated.

Cofactor specificity of LysM 

A systematic EMSA analysis of LysM Ss binding to the 
gltB operator with all 20 naturally occurring L-amino 
acids as potential effector molecules confirmed that 
lysine reduces, but does not completely abolish, binding 
of LysM Ss , as observed previously for the lysW operator 
(22) (Figure 9A and B). Furthermore, arginine, glutamine,
isoleucine, leucine, methionine and valine also specifically
reduce LysM binding at 5 mM. Similar results were
obtained with lysW, leuA-2 and Sso2043 operator
regions (data not shown). As for lysine, addition
of these six amino acids reduced binding affinity but did
not completely abolish complex formation (Figure 9B).
L-lysine has the strongest inhibitory effect of all tested
potential effector molecules and therefore likely has the
highest affinity for LysM Ss (Figure 9). It was found to
exert a rather similar effect from ~10 µM up to 5 mM,
the highest concentration tested (Figure 9B and D).

We tested a series of precursors of lysine, arginine and 
glutamine biosynthesis or analogues of lysine and arginine 
that show structural similarities with lysine (Figure 9C and 
E). The EMSA experiments indicate that D-lysine and
L-homoarginine also exert a negative effect on LysM Ss
binding, even in the µM range (Figure 9D), and that
homoarginine is a more potent cofactor than arginine.
The effect of L-canavanine and L-citrulline was
less pronounced, whereas L-ornithine and
D,L-α,ε-diaminopimelic acid had no significant effect
(Figure 9C). Similarly, 2-oxoglutarate had no effect by
itself and neither did it interfere with the negative effect
of glutamine (Figure 9C).

Combined, these results indicate that both the amino
group of L- and D-lysine and the guanidino group of
arginine and homoarginine can be accommodated in the
cofactor binding-pocket. The optimal length of the
cofactor side chain is four CH2 groups.



DISCUSSION 

Lrp-like regulators are abundantly present in archaeal 
genomes (3) and appear to play an important role in adap- 
tation of cellular metabolism to variations in concentra- 
tion of amino acids as signaling molecules (2,41,42). Some 
bacterial or archaeal Lrps control only a single or a few 
target genes, whereas others regulate a vast number of 
genes involved in various pathways (1,42,43). Here we 
identify 73 binding sites for LysM and demonstrate that 
LysM is a much more versatile regulator than originally 
thought. Binding sites for LysM may occur singly or in 
combination with an auxiliary more degenerated LysM 
box, as demonstrated for lysW and Sso2043. LysM is 
not only involved in transcriptional control of lysine bio- 
synthesis, but it also modulates expression of genes 
involved in biosynthesis of leucine and glutamate and of 
genes annotated as amino acid transport-related. Clearly, 
the control of amino acid metabolism and transport is a 
primary task of LysM, although the transcription factor 
also binds in the neighbourhood of promoters expressing 
genes with a variety of other functions. 

Lysine is the main effector molecule of LysM. In vitro 
DNA binding was invariably reduced in the presence of 
lysine and in vivo, a significant downregulation was 
observed on lysine supplementation for all four tested 
targets (lysX, gltB, leuA-2 and Sso1906). We can assume
that this regulatory response originates at least partially
from LysM action, and it is strongly suggested that LysM
functions as an activator. Although FL11 from
Pyrococcus sp. OT3 and LysM have similar cofactor
specificity with lysine as the major coregulator, their mode of
action is clearly different. Whereas FL11 generally works
as a repressor of which the activity is relieved in presence
of lysine (9), LysM appears to act as a transcriptional
activator in absence of lysine. For lysW, leuA-2, Sso1906
and Sso2043, the main LysM binding site is located just
upstream of predicted BRE and TATA box elements
(Figure 7E). This is reminiscent of activation by Ptr2, an
Lrp-like transcription factor from M. jannaschii that binds
at an equivalent relative position and activates transcrip- 
tion by stimulating protein-protein interactions with TBP 
(20). In the cases of Sso2043 and lysW, binding of LysM 
at a secondary accessory binding site located either 
upstream or downstream of the core binding site 
(Figures 2F and 7E) might contribute to more profound 
regulatory effects: either further activation or a switch to 
repression. In contrast, the distance between the main 
LysM binding site and the translational initiation site in 
the gltB promoter region suggests that the regulator is 
positioned in between the promoter region and initiation 
site. 

Often, Lrp-like regulators display cofactor promiscuity
(9,17,44). Here we demonstrate that, besides lysine,
several other amino acids (i.e. arginine, glutamine,
leucine, isoleucine, methionine and valine) reduce
DNA-binding affinity of LysM (Figure 9). The preference
for the most potent effector molecules, that is lysine >
arginine > glutamine, is identical to that of FL11, and the
structural basis for this similarity lies in conservation of a
glutamine at position 97 and an aspartate at position
121. Side-chain oxygen atoms of these residues form
hydrogen bonds with the lysine side chain, thereby
determining ligand specificity (45). An arginine-FL11
cocrystal structure demonstrates that arginine is less
efficient because of the longer side chain causing
conformational changes in the protein (45), and we can assume
that a similar situation occurs in LysM. The stronger
effect of homoarginine observed for LysM can be
explained by its greater structural similarity to lysine
(4 CH2 groups preceding an N-atom that may engage in
H-bonding) (Figure 9E). Competitive binding of different
amino acids, each with a different affinity, will allow an
adapted regulatory response to the nutritional state of the
cell. This state is reflected by concentration ratios of
different amino acids rather than by the absolute
concentration of a single amino acid type. Furthermore, a broad
cofactor specificity is also rationalized by the observation
that LysM regulates not only lysine biosynthesis but also
biosynthesis and transport of other amino acids.

Figure 9. Effect of different cofactors on LysM Ss DNA binding to the gltB control region on a 167-bp fragment monitored by EMSAs.
(A) EMSAs to test binding in presence of each of the 20 L-amino acids. Positions of free DNA (F), single-stranded DNA (SS) and LysM Ss-DNA
complexes (B1 and B2) are pointed out. In the bottom row above the autoradiograph, presence or absence of LysM Ss in the
reaction mixture is indicated as + or −, respectively. Used protein concentration is 250 nM. In the top row above the autoradiograph, the three-letter
code of the added amino acid is displayed (final concentration 5 mM). (B) EMSAs in which a concentration gradient of a selection of amino
acids was tested. Notations are similar as in subpanel A; amino acid concentrations are given in mM. (C) EMSAs to test the effect of the

All LysM Ss targets were bound in vivo in growth condi- 
tions with and without lysine supplementation, whereas 
enrichments quantified by qPCR, in vitro binding affinity 
of LysM and promoter activities were invariably lower in 
the presence of lysine. This indicates that ChIP-chip is not
a valid quantitative measure of target site occupancy and
likely reflects the observation that lysine reduces, but does
not completely abolish, binding of LysM, an observation
that was made with all six tested intergenic and intragenic
binding sites (lysW, gltB, leuA-2, Sso1906, Sso2043 and
Sso2497, Figure 9 and data not shown). It is also
reminiscent of the effect of L-leucine on DNA-binding capacity
of E. coli Lrp (46,47), although in this case, ChIP-chip
DNA-binding profiles were clearly sensitive to the
presence of exogenous leucine (16). The mechanisms
underlying the negative effect of ligand binding on 
DNA-binding affinity of LysM are not yet understood. 
In some instances, such as for E. coli Lrp and the
archaeal FL11, cofactor binding influences the association
state of the Lrp-like protein, thereby shifting the
equilibrium between different oligomeric forms that each have
different DNA-binding characteristics (45,48). However, 
this is not a general rule and ligand binding might also 
induce more subtle conformational changes (6,8). 
Preliminary data indicate that this is the case for LysM, 
as lysine does not affect the tetrameric state of the regu- 
lator in solution (22). Possibly, effector binding reduces 
the DNA-binding capacity of LysM by re-orienting two 
helix-turn-helix motifs with respect to successive major 
groove segments of the operator. Besides diminishing 
DNA-binding affinity, this re-orientation could equally 
modify the position of the C-terminal surface responsible
for transcription regulation with respect to elements of the
basal transcription apparatus, thereby affecting regulatory 
outcome. 

LysM does not appear to control expression of other 
transcription factors. Nevertheless, it cannot be excluded 
that LysM is involved in regulatory networking. This situ- 
ation might occur at targets that are efficiently bound 
in vivo but not in vitro, as demonstrated for Sso2824 
and Sso2334-2336. Such differential binding is generally 
ascribed to involvement of co-regulators, other transcrip- 
tion factors or nucleoid-associated proteins, which may 
enhance binding affinity and/or stability at sites with a 
low intrinsic affinity for the regulator. In a proteome-wide 
study of protein-protein interactions in Pyrococcus 
horikoshii OT3, hetero-interactions have been identified 
for two Lrp-like proteins (49). Additionally, FL9 and 
DM1 from Pyrococcus OT3, a full-length and truncated 
Lrp protein, respectively, have been shown to form 
hetero-octamers (44). Possibly, LysM is capable of
establishing such hetero-oligomeric interactions, thereby largely
expanding its regulatory power, with respect to both
effector response and target gene repertoire. Whether
LysM exerts autoregulation is not entirely clear. In vivo, 
LysM associates with the lysYZM promoter region, but 
we did not detect binding in vitro, and northern blotting 
indicated that production of lysY and lysM mRNA was 
not affected on addition of lysine (22). 

A large fraction (76%) of LysM-binding sites is located 
inside translated regions. Eleven of these are predicted to 
harbour a high-affinity bona fide LysM-binding motif
(theoretical K_D < 50 nM). Such a high frequency of intragenic
binding sites is unusual for bacterial transcriptional
regulators (16,50-52), with E. coli RutR as an exception (53).
For archaeal transcription factor TrmB of Halobacterium
salinarum NRC-1, 40% of binding sites are located inside
coding sequences (54). Possibly, intragenic binding is more
common for archaeal than for bacterial transcription
factors, given their more compact genome organizations
and smaller average intergenic region lengths (55). Indeed,
some LysM Ss targets, for example the lysW and Sso0572
operators, appear to be true promoter-associated
regulatory sites while located in the 3'-end of the preceding ORF
sequence. Other intragenic sites might also have regula- 
tory functions for as yet undetected transcription units, 
given the flexible transcriptome architecture of 
S. solfataricus with a high abundance of conditionally 
active initiation and termination sites inside operons and 
of small RNAs (56,57). Alternatively, these binding sites
do not function in direct transcription regulation but serve
to control the intracellular concentration of free
regulatory protein, or they merely occur by chance without any
functionality and have not been removed by evolution, as
was postulated for RutR (53).

Figure 9. Continued
following molecules: D-lys = D-lysine; L-lys = L-lysine; L-gln = L-glutamine; 2-OG = 2-oxoglutarate; DAP = D,L-α,ε-diaminopimelic acid; L-arg =
L-arginine; L-can = L-canavanine; L-homoarg = L-homoarginine; L-cit = L-citrulline; L-orn = L-ornithine. Notations are similar as in subpanel (A and
B). All cofactor concentrations are displayed in mM. For binding reactions to which L-glutamine and 2-oxoglutarate are added simultaneously
(indicated by 'L-gln + 2-OG'), shown concentrations correspond to L-glutamine, whereas 2-oxoglutarate had a final concentration of 5.0 mM in both
binding reactions. (D) EMSAs to test low concentration gradients for a selection of cofactors. Notations are similar as in subpanel (A, B and C).
However, in this subpanel, cofactor concentrations are displayed in µM. (E) Molecular structures of LysM effector molecules (except for
branched-chain amino acids) tentatively ranked according to the strength of their inhibitory effect. The shaded atom in the L-lysine and D-lysine
structure indicates a stereochemical difference at this position.

LysM Ss is associated with both control regions of genes 
encoding amino acid transporters (Sso1906/Sso2043) in
paralogous highly conserved gene clusters that also 
contain a glutamate dehydrogenase located downstream 
of the transporter gene and a divergently transcribed 
allantoin permease. The Sso1906 gene cluster is flanked
by a transposase, suggesting that gene duplication was 
mediated by a transposition event (58). The intergenic 
regions are highly conserved without a selective pressure 
for conservation of the LysM-binding motifs (data not 
shown). In vitro, LysM Ss binds with a significantly 
higher affinity to the promoter region of the ancestral 
Sso2043, which contains a binding motif corresponding 
perfectly to the consensus sequence, than to the Sso1906
promoter, in which two mismatches have originated at 
specificity-determining positions (Figure 6 and Table 1). 
Moreover, the accessory binding site identified in the 
Sso2043 control region is lost in the Sso1906 control
region. Therefore, although our ChIP-chip data indicate
that this LysM Ss regulatory interaction has been inherited 
after the gene duplication event, differences in binding 
affinity might lead to differential regulation. As a consequence,
the duplicated genes might respond slightly differently
to metabolic needs, thereby placing a selection
pressure on maintenance of the duplicated gene cluster 
and on further degeneration and possibly even on the 
future loss of the LysM Ss-binding motif in the Sso1906
promoter region. Our analysis has provided a 'snapshot' 
of the evolutionary expansion of the LysM Ss regulon. 

This work illustrates the power of genome-wide ChIP 
approaches to obtain a global view of the full-range target 
site distribution of a transcription factor, but also stresses 
the necessity to couple such techniques to in-depth studies 
for a correct interpretation and full understanding of the 
physiological role of the regulator in the cell. 